reduction algorithm
Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT
Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. To address this major TDA limitation, we develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs. First, we prove that the $(k+1)$-core of a graph $G$ suffices to compute its $k^{\text{th}}$ persistence diagram, $PD_k(G)$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains of up to 95%. The developed framework provides the first bridge between graph theory and TDA, with applications in machine learning of large complex networks.
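The two reductions named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `k_core` and `prune_dominated` are hypothetical, and the graph is assumed to be a dict mapping each vertex to its set of neighbours. The sketch also assumes the usual closed-neighbourhood notion of domination (u is dominated by an adjacent v when every neighbour of u is v or a neighbour of v), and it ignores filtration values, so any conditions the paper places on the filtering function are not reproduced here.

```python
def k_core(adj, k):
    """Return the k-core: repeatedly drop vertices with fewer than k neighbours."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    queue = [v for v, ns in adj.items() if len(ns) < k]
    while queue:
        v = queue.pop()
        if v not in adj:  # already removed via an earlier queue entry
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) < k:
                queue.append(u)
    return adj


def prune_dominated(adj):
    """Repeatedly remove vertices whose closed neighbourhood lies inside a neighbour's."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            closed_u = adj[u] | {u}
            if any(closed_u <= adj[v] | {v} for v in adj[u]):
                for w in adj.pop(u):
                    adj[w].discard(u)
                changed = True
    return adj


# Toy graph (an assumption for illustration): a 4-cycle 0-1-2-3 with a pendant vertex 4 on 0.
G = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}

core_for_pd1 = k_core(G, 2)   # (k+1)-core with k = 1
pruned = prune_dominated(G)   # pendant vertex 4 is dominated by 0
```

In this toy graph both reductions discard the pendant vertex and keep the 4-cycle, which is the only part of the graph that carries a 1-dimensional hole.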
TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction
Mu, Xuechen, Huang, Zhenyu, Li, Kewei, Zhang, Haotian, Wang, Xiuli, Fan, Yusi, Zhang, Kai, Zhou, Fengfeng
Recent advancements in feature representation and dimension reduction have highlighted their crucial role in enhancing the efficacy of predictive modeling. This work introduces TemporalPaD, a novel end-to-end deep learning framework designed for temporal pattern datasets. TemporalPaD integrates reinforcement learning (RL) with neural networks to achieve concurrent feature representation and feature reduction. The framework consists of three cooperative modules: a Policy Module, a Representation Module, and a Classification Module, structured based on the Actor-Critic (AC) framework. The Policy Module, responsible for dimensionality reduction through RL, functions as the actor, while the Representation Module for feature extraction and the Classification Module collectively serve as the critic. We comprehensively evaluate TemporalPaD using 29 UCI datasets, a well-known benchmark for validating feature reduction algorithms, through 10 independent tests and 10-fold cross-validation. Additionally, given that TemporalPaD is specifically designed for time series data, we apply it to a real-world DNA classification problem involving enhancer category and enhancer strength. The results demonstrate that TemporalPaD is an efficient and effective framework for achieving feature reduction, applicable to both structured data and sequence datasets.
Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging
Deng, Yangfan, Albidah, Hamad, Dallal, Ahmed, Yin, Jijun, Mao, Zhi-Hong
Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenges. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to improve the performance of dimensionality reduction. Inspired by topological data analysis, which can analyze the structure of high-dimensional data, we extract topological features from the EEG signals to compensate for the structural information loss that happens in traditional spectro-temporal data analysis. Supported by the topological visualization of the data from different sleep stages and the classification results, the proposed features are proven to be effective supplements to traditional features. Finally, we compare the performances of three dimensionality reduction algorithms: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Among them, t-SNE achieved the highest accuracy of 79.8%, but considering the overall performance in terms of computational resources and metrics, UMAP is the optimal choice.
Attribute reduction algorithm of rough sets based on spatial optimization
Rough set theory is one of the important methods for rule acquisition and attribute reduction. Current rough set attribute reduction focuses mainly on minimizing the number of reduced attributes and ignores the spatial similarity between the reduced attributes and the decision attributes, which may lead to problems such as an increased number of rules and limited generality. In this paper, a rough set attribute reduction algorithm based on spatial optimization is proposed. By introducing the concept of spatial similarity, the algorithm searches for the reduction with the highest spatial similarity to the decision attributes, so that more concise and more widely applicable rules are obtained. In addition, a comparative experiment against traditional rough set attribute reduction algorithms is designed to demonstrate the effectiveness of the proposed algorithm, which achieves significant improvements on many datasets.
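For context, the traditional kind of attribute reduction that this abstract compares against can be sketched as a greedy search on the rough-set dependency degree (the fraction of objects whose equivalence class agrees on the decision). This is a minimal sketch of that classical baseline only. The spatial-similarity criterion of the paper is not specified in the abstract and is not modelled here. The names `dependency` and `greedy_reduct` and the toy decision table are assumptions for illustration, and a greedy search is not guaranteed to return a smallest reduct.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Fraction of objects whose equivalence class under `attrs` has one decision value."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Add the attribute that most increases dependency until it matches the full set."""
    target = dependency(rows, attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max((a for a in attrs if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen


# Hypothetical decision table: the decision "d" is fully determined by attribute "b".
table = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

reduct = greedy_reduct(table, ["a", "b", "c"], "d")
```

On this table the search stops after selecting `b`, because that single attribute already separates the two decision values.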
Pharmacoprint -- a combination of pharmacophore fingerprint and artificial intelligence as a tool for computer-aided drug design
Warszycki, Dawid, Struski, Łukasz, Śmieja, Marek, Kafel, Rafał, Kurczab, Rafał
Structural fingerprints and pharmacophore modeling are methodologies that have been used for at least two decades in various fields of cheminformatics, from similarity searching to machine learning (ML). Advances in in silico techniques consequently led to combining both these methodologies into a new approach known as the pharmacophore fingerprint. Herein, we propose a high-resolution pharmacophore fingerprint called Pharmacoprint that encodes the presence, types, and relationships between pharmacophore features of a molecule. Pharmacoprint was evaluated in classification experiments using ML algorithms (logistic regression, support vector machines, linear support vector machines, and neural networks) and outperformed other popular molecular fingerprints (i.e., Estate, MACCS, PubChem, Substructure, Klekota-Roth, CDK, Extended, and GraphOnly) and the ChemAxon Pharmacophoric Features fingerprint. Pharmacoprint consists of 39973 bits; several methods were applied for dimensionality reduction, and the best algorithm not only reduced the length of the bit string but also improved the efficiency of the ML tests. Further optimization allowed us to define the best parameter settings for using Pharmacoprint in discrimination tests and for maximizing statistical parameters. Finally, Pharmacoprint generated for 3D structures with defined hydrogens as input data was applied to neural networks with a supervised autoencoder for selecting the most important bits, which raised the Matthews correlation coefficient to as high as 0.962. The results show the potential of Pharmacoprint as a new, promising tool for computer-aided drug design.
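The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The following minimal Python sketch shows the standard formula. The function name and the convention of returning 0.0 when the denominator vanishes are choices made for this illustration, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)      # every prediction correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)     # every prediction wrong
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)     # no better than guessing
```

The coefficient ranges from -1 to 1, so a value of 0.962 indicates predictions that agree with the labels almost perfectly across both classes.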
Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density $p(x) = e^{-U(x)}$ we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential $2U(x)$ with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.
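The diffusion distance mentioned in this abstract can be illustrated with a small pure-Python sketch. It assumes one common definition: row-normalize the adjacency matrix into a Markov matrix, take t steps of the random walk, and compare the resulting transition distributions of two points, weighting each coordinate by the inverse of the stationary distribution. The function names and the toy graph are assumptions for illustration. The paper's low-dimensional representation uses the leading eigenvectors of the Markov matrix instead, and this sketch uses matrix powers to avoid an eigensolver.

```python
def markov_matrix(W):
    """Row-normalize a symmetric adjacency matrix into transition probabilities."""
    return [[w / sum(row) for w in row] for row in W]


def mat_mul(A, B):
    """Plain square matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_distance_sq(W, x, y, t):
    """Squared diffusion distance between points x and y after t >= 1 steps."""
    P = markov_matrix(W)
    Pt = P
    for _ in range(t - 1):
        Pt = mat_mul(Pt, P)
    total = sum(sum(row) for row in W)
    phi0 = [sum(row) / total for row in W]  # stationary distribution, proportional to degree
    return sum((Pt[x][z] - Pt[y][z]) ** 2 / phi0[z] for z in range(len(W)))


# Toy graph (an assumption): two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

within_cluster = diffusion_distance_sq(W, 0, 1, t=2)
across_clusters = diffusion_distance_sq(W, 0, 5, t=2)
```

Points in the same triangle end up closer in diffusion distance than points in different triangles, which is the behaviour that spectral clustering exploits.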